
Assignment 1

Report for the first assignment of Effective MLOps: Model Development course.


Problem and Dataset

  • The problem I picked is an ordinal regression task: predicting the review rating of books on Goodreads (a Kaggle competition)
  • The data consists of approximately 0.9M book reviews, each containing the book, the author, the review text, and the review's stats. For a full description, check out the dataset information on Kaggle

EDA Raw Data


(Table: preview of the raw reviews, with columns user_id, book_id, review_id, rating, review_text, date_added, date_updated, read_at, started_at, n_votes, n_comments)
  • The book_id feature is read as an integer (ordinal) although it is actually categorical
  • The date columns (i.e. date_added, date_updated, read_at, started_at) are strings instead of datetime
  • The features read_at and started_at have many missing entries (a quick sketch of these checks follows this list)
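A minimal sketch of the checks behind these observations, assuming the reviews are loaded into a pandas DataFrame from a local CSV (the file name is an assumption) and that missing dates appear as NaN:

```python
import pandas as pd

# Load the raw reviews; the file name is an assumption, adjust to the Kaggle download
df = pd.read_csv("goodreads_train.csv")

# book_id is read as an integer and the date columns as plain strings
print(df.dtypes[["book_id", "date_added", "date_updated", "read_at", "started_at"]])

# read_at and started_at have many missing entries
print(df[["read_at", "started_at"]].isna().sum())
```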

(Chart: reviews grouped by rating, showing the review count and the average n_votes and n_comments per rating)
  • Rating 4 is the most common, so the naïve baseline is to set every review's rating to 4 and compute the F1 score
  • People tend to leave a comment when the review is either really bad or really good (the aggregation behind the chart is sketched below)
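A sketch of the per-rating aggregation behind the chart, reusing the DataFrame df from above; whether n_comments is averaged or summed in the original chart is an assumption:

```python
# Per-rating summary: number of reviews, average votes, and average comments
summary = df.groupby("rating").agg(
    count=("review_id", "size"),
    avg_n_votes=("n_votes", "mean"),
    avg_n_comments=("n_comments", "mean"),
)
print(summary)
```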

Data Processing

  • Take the absolute value of n_votes and n_comments (counts cannot be negative)
  • Fill missing read_at values with date_added
  • Convert the date string columns to pandas datetime
  • Fill the remaining missing values with the mode of the respective column
  • Derive additional features (missing_started_at, reading_duration, review_length, spoiler, hour, month, dayofweek, year, ...)
  • Drop the review_text column
  • Convert the id features to the category dtype (a pandas sketch of these steps follows this list)
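A minimal sketch of this preprocessing, assuming the DataFrame df from the EDA sketch above and that missing values appear as NaN; only a subset of the listed derived features is shown:

```python
import pandas as pd

DATE_COLS = ["date_added", "date_updated", "read_at", "started_at"]
ID_COLS = ["user_id", "book_id", "review_id"]

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()

    # Counts cannot be negative
    df["n_votes"] = df["n_votes"].abs()
    df["n_comments"] = df["n_comments"].abs()

    # Flag missing started_at before any imputation
    df["missing_started_at"] = df["started_at"].isna().astype(int)

    # Fill missing read_at with date_added, then parse the date strings
    df["read_at"] = df["read_at"].fillna(df["date_added"])
    for col in DATE_COLS:
        df[col] = pd.to_datetime(df[col], errors="coerce", utc=True)

    # A subset of the derived features listed above
    df["reading_duration"] = (df["read_at"] - df["started_at"]).dt.days
    df["review_length"] = df["review_text"].str.len()
    df["hour"] = df["date_added"].dt.hour
    df["month"] = df["date_added"].dt.month
    df["dayofweek"] = df["date_added"].dt.dayofweek
    df["year"] = df["date_added"].dt.year

    # Fill the remaining missing values with the column mode
    for col in df.columns:
        if df[col].isna().any():
            df[col] = df[col].fillna(df[col].mode()[0])

    # Drop the raw text and cast the id features to category
    df = df.drop(columns=["review_text"])
    for col in ID_COLS:
        df[col] = df[col].astype("category")

    return df

processed = preprocess(df)
```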

Naïve Baseline

  • Set all ratings in the validation set to 4 (sketched below)
  • F1 score: 0.08
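A sketch of this naïve baseline, assuming the processed DataFrame from the preprocessing sketch above and a simple 80/20 split; macro averaging is an assumption, and the exact averaging should follow the competition metric:

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# processed: the DataFrame returned by preprocess() above
train_df, valid_df = train_test_split(processed, test_size=0.2, random_state=42)

y_val = valid_df["rating"].to_numpy()
y_pred = np.full_like(y_val, 4)  # predict the most common rating for every review

print(f1_score(y_val, y_pred, average="macro"))
```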

Baseline Model

  • LightGBM Classifier (default parameters)
  • Below are the key metric (F1 score), the training charts, and the predictions and feature importance tables; a training sketch follows this list.
  • Conclusions:
    • The F1 score improves drastically compared with the naïve baseline, even though the model was trained with the default hyperparameters
    • Looking at the evolution of the F1 score and the multi logloss over the iterations, one can see that they have not reached a plateau yet. Hence, increasing the number of iterations would likely improve the performance of the model
    • The feature importance bar plot suggests that book_id and user_id are the most important features, implying that certain users and books are reviewed either really well or really poorly
  • F1 score: 0.38
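A minimal training sketch for this baseline, reusing the train/validation split from the naïve baseline above; leaving review_id and the raw datetime columns out of the features, and reporting macro-averaged F1, are assumptions on my part:

```python
import lightgbm as lgb
from sklearn.metrics import f1_score

target = "rating"
# review_id is a unique identifier and the raw datetime columns are not numeric/categorical,
# so both are excluded from the model inputs (assumption)
drop_cols = {target, "review_id", "date_added", "date_updated", "read_at", "started_at"}
features = [c for c in train_df.columns if c not in drop_cols]

model = lgb.LGBMClassifier()  # default hyperparameters
model.fit(
    train_df[features], train_df[target],
    eval_set=[(valid_df[features], valid_df[target])],
    eval_metric="multi_logloss",
)

y_pred = model.predict(valid_df[features])
print(f1_score(valid_df[target], y_pred, average="macro"))

# Feature importances behind the bar plot mentioned above
for name, imp in sorted(zip(features, model.feature_importances_), key=lambda t: -t[1]):
    print(name, imp)
```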



Code available on GitHub